Measuring preferential attachment for evolving networks
A key ingredient of current models proposed to capture the topological evolution of complex networks is the hypothesis that highly connected nodes increase their connectivity faster than their less connected peers, a phenomenon called preferential attachment. Measurements on four networks, namely the science citation network, the Internet, the actor collaboration network and the science coauthorship network, indicate that the rate at which nodes acquire links depends on the node's degree, offering direct quantitative support for the presence of preferential attachment. We find that for the first two systems the attachment rate depends linearly on the node degree, while for the latter two the dependence follows a sublinear power law.

PACS numbers: 89.65.-s, 89.75.-k, 05.10.-a



Modeling the highly interconnected nature of various social, biological and communication systems as complex networks or graphs has attracted much attention in the last few years [?]. As these networks were for a long time modeled as completely random [15, 16], the recent interest is motivated by the increasing evidence that real networks display short length-scale clustering [?] and obey unexpected scaling laws [?], interpreted as signatures of deviation from randomness. Current approaches, using the tools of statistical physics [?], search for universalities both in the topology of these webs and in the dynamics governing their evolution. These efforts resulted in a class of models that view networks as evolving dynamical systems, rather than static graphs. Most evolving network models are based on two ingredients [?]: growth and preferential attachment. The growth hypothesis suggests that networks continuously expand through the addition of new nodes and links between the nodes, while the preferential attachment hypothesis states that the rate $\Pi(k)$ with which a node with $k$ links acquires new links is a monotonically increasing function of $k$. While most versions of such evolving network models assume that $\Pi(k)$ is linear in $k$ [?], recently several authors proposed that $\Pi(k)$ could follow a power law [?]. Consequently, the time evolution of the degree $k_i$ of node $i$ can be obtained from the first order differential equation

$$\frac{dk_i}{dt} = m\,\Pi(k_i), \qquad (1)$$

where $m$ is a constant and $\Pi(k)$ has the form

$$\Pi(k_i) = \frac{k_i^{\alpha}}{\sum_j k_j^{\alpha}} = C(t)\,k_i^{\alpha}, \qquad (2)$$

with $\alpha > 0$ an unknown scaling exponent. For $\alpha = 1$ these models reduce to the scale-free model [?], for which the degree distribution $P(k)$, giving the probability that a node has $k$ links, follows $P(k) \propto k^{-\gamma}$ with $\gamma = 3$. As Krapivsky, Redner and Leyvraz have shown [?], for $\alpha < 1$ the degree distribution follows a stretched exponential, while for $\alpha > 1$ a gelation-like phenomenon is expected, where a single site links to nearly all other nodes. On the other hand, hypothesis (2) raises a series of fundamental questions that have not yet been answered directly by experimental data. First, is preferential attachment indeed present in real networks? That is, does $\Pi(k)$ indeed depend on $k$, or is it independent of $k$, as assumed by both the Erdős-Rényi [?] and Watts-Strogatz [?] models? Second, if $\Pi(k)$ does depend on $k$, what is its functional form? Is it linear, as assumed in [4], or does it follow a power law, as suggested in [10]? Could $\Pi(k)$ follow some unknown and yet unexplored functional form?
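
As a worked illustration of the linear case, the following sketch recovers the exponent 3 of the degree distribution from a continuum argument. It assumes the rate equation $dk_i/dt = m\,\Pi(k_i)$ with $\Pi(k_i) = k_i/\sum_j k_j$, that every new node arrives with $m$ links so that $\sum_j k_j \simeq 2mt$ at large $t$, and that nodes arrive at a uniform rate. The arrival time $t_i$ of node $i$ is notation introduced here for the illustration.

```latex
\begin{align}
\frac{dk_i}{dt} &= m\,\frac{k_i}{\sum_j k_j} \simeq \frac{k_i}{2t}
  \quad\Longrightarrow\quad
  k_i(t) = m\left(\frac{t}{t_i}\right)^{1/2},
  \qquad k_i(t_i) = m, \\
\mathrm{Prob}\left[k_i(t) < k\right]
  &= \mathrm{Prob}\left[t_i > \frac{m^2 t}{k^2}\right]
  \simeq 1 - \frac{m^2}{k^2}, \\
P(k) &= \frac{\partial}{\partial k}\,\mathrm{Prob}\left[k_i(t) < k\right]
  \simeq \frac{2m^2}{k^3} \propto k^{-3}.
\end{align}
```

The same steps do not close in this simple form for a general exponent, because the normalization sum then depends on the full degree distribution.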

Here we offer the first direct attempt to answer these questions in quantitative terms by proposing a numerical method that allows us to extract the functional form of $\Pi(k)$ directly from dynamical data on real evolving networks. Our measurements indicate that $\Pi(k)$ follows a power law for all investigated networks. For the Internet and the citation network we find $\alpha \approx 1$, while for the science collaboration network and the Hollywood actor network the results indicate sublinear attachment, i.e. $\alpha < 1$. These results offer the crucial missing link for modeling the dynamics of complex evolving networks.

Methods: To measure $\Pi(k)$ we use computerized data on the dynamics of large networks. Consider a network for which we know the order in which each node and link joins the system. According to (1) and (2) the function $\Pi(k)$ gives the rate at which an existing node with $k$ links acquires new links as the network grows. To measure $\Pi(k)$ we need to monitor which old nodes the new nodes link to, as a function of the degree of the old node. However, there is an important problem with this simple approach: the normalization constant, $C(t)$, depends on the time at which a given node joins the system, creating unwanted biases in the measurements. To avoid such bias we study the attachment of new nodes within a relatively short time frame. Consider all nodes existing in the system at time $T_0$, called "$T_0$ nodes". Next select a group of "$T_1$ nodes", added between $[T_1, T_1 + \Delta T]$, where $\Delta T \ll T_1$ and $T_1 > T_0$. When a $T_1$ node joins the system we record the degree $k$ of the $T_0$ node to which the new node links. The histogram providing the number of links acquired by the $T_0$ nodes with degree exactly $k$, after normalization, gives the function $\Pi(k, T_0, T_1)$. If the growing network develops a stationary state, $\Pi(k, T_0, T_1)$ should be independent of $T_0$ and $T_1$, and should depend on $k$ only, providing us with the preferential attachment function $\Pi(k)$. As we are forced to use relatively short $\Delta T$ intervals, even for large networks with hundreds of thousands of nodes $\Pi(k)$ has significant fluctuations, particularly for large $k$. To reduce the noise level, instead of $\Pi(k)$ we study the cumulative function

$$\kappa(k) = \int_0^k \Pi(k')\,dk'. \qquad (3)$$

If $\Pi(k)$ follows (2), we expect that

$$\kappa(k) \propto k^{\alpha+1}. \qquad (4)$$
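
The measurement procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the paper: the function name `measure_attachment`, the `(time, u, v)` event format, taking a node's join time as the time of its first link, fixing degrees at the start of the window, and normalizing the histogram by the number of old nodes of each degree are all choices made here for the example.

```python
from collections import Counter


def measure_attachment(events, t0, t1, dt):
    """Estimate Pi(k) and the cumulative kappa(k) from timestamped links.

    events: iterable of (time, u, v) undirected links. A node's join time is
    taken to be the time of the first link it appears in (assumption).
    The measurement window is the half-open interval [t1, t1 + dt).
    """
    if not t0 < t1:
        raise ValueError("expected t0 < t1")
    events = list(events)

    # Join time of each node = time of its first recorded link.
    join = {}
    for t, u, v in events:
        for n in (u, v):
            join[n] = min(t, join.get(n, t))

    # Degree of every node just before the window opens (held fixed inside it).
    degree = Counter()
    for t, u, v in events:
        if t < t1:
            degree[u] += 1
            degree[v] += 1

    old = {n for n, t in join.items() if t <= t0}            # "T0 nodes"
    new = {n for n, t in join.items() if t1 <= t < t1 + dt}  # "T1 nodes"

    # Histogram of the degree of the old node chosen by each new node.
    hits = Counter()
    for t, u, v in events:
        if t1 <= t < t1 + dt:
            for a, b in ((u, v), (v, u)):
                if a in new and b in old:
                    hits[degree[b]] += 1

    # Per-node rate: divide by the number of old nodes having that degree,
    # then normalize so that the values sum to one (assumed normalization).
    n_k = Counter(degree[n] for n in old)
    raw = {k: hits[k] / n_k[k] for k in n_k}
    total = sum(raw.values())
    if total == 0:
        raise ValueError("no links from T1 nodes to T0 nodes in the window")
    pi = {k: r / total for k, r in raw.items()}

    # Cumulative function: discrete analogue of the integral of Pi up to k.
    kappa, running = {}, 0.0
    for k in sorted(pi):
        running += pi[k]
        kappa[k] = running
    return pi, kappa
```

Called as `pi, kappa = measure_attachment(events, t0, t1, dt)`, the function returns two dictionaries keyed by degree. The cumulative sum plays the role of the integral in the text, which is what smooths the fluctuations at large degree.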



Measurements: The method described above can be 
applied to systems for which the order in which the nodes 
are added to the network is known. In this respect we had 
access to four different computerized networks, whose 
main parameters are shown in Table 1. 

(1) In the coauthorship network of neuroscience (NS) the nodes are scientists, two nodes being linked if they coauthored a paper [?]. The database we considered contains the paper titles and authors of all relevant journals in the field of NS published between 1991 and 1998. Similar to other collaboration networks [12], the distribution $P(k)$ for NS follows a power law. Papers published between 1991 and 199x are used to reveal the network topology up to the considered year 199x, while papers published in year 199x+1 are used to measure $\Pi(k)$.

(2) In the citation network the nodes are papers published in 1988 in Physical Review Letters, and links represent the citations these articles received. $T_0$ is chosen as the year 1989.

(3) In the actor network the nodes are actors, who are linked if they played together in a movie. The network we investigated contains all movies and actors from 1892 up to 1999 [?]. We determined $\Pi(k)$ for actors that debuted between 1920 and 1940, i.e. $T_0 = 1940$. We followed the evolution of the new links yearly between 1942 and 1993.

(4) For the Internet data we investigated, the nodes represent Autonomous Systems (AS) and links are direct connections between them [19]. The available data follow the network evolution from 1997 up to the present day. The function $\Pi(k)$ was determined for the nodes existing in year 2000.

Results: The $\kappa(k)$ functions obtained for the discussed databases are shown in Figs. 1 and 2. If preferential attachment is absent, i.e. $\Pi(k)$ is independent of $k$, we expect $\kappa(k) \propto k$. In Figs. 1 and 2 the linear fit is shown as a continuous line. In each of the investigated examples the increase of $\kappa(k)$ is faster than linear, offering direct evidence that preferential attachment is present in each of the considered systems. Furthermore, we find that the curves follow a straight line on a log-log plot, indicating that to a good approximation the power-law hypothesis (2) is valid. Note that apart from statistical fluctuations, the functional form of $\Pi(k)$ is independent of $T_0$, supporting the stationary nature of the attachment process. There is some degree of variation, however, when it comes to the value of the exponent $\alpha$.
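
Reading the exponent off a log-log plot can be illustrated with a small fitting routine. The sketch below assumes an ordinary least-squares line through the points (log k, log κ(k)); the text does not state which fitting procedure was used, so this choice and the name `fit_alpha` are assumptions made for the example. Since the cumulative function is expected to scale with exponent α + 1, the attachment exponent is the fitted slope minus one.

```python
import math


def fit_alpha(kappa):
    """Estimate alpha from a cumulative attachment function.

    kappa: dict mapping degree k (> 0) to kappa(k) (> 0).
    Fits log(kappa) = slope * log(k) + const by least squares and
    returns slope - 1, since kappa(k) is expected to scale as k**(alpha + 1).
    """
    pts = [(math.log(k), math.log(v)) for k, v in kappa.items() if k > 0 and v > 0]
    if len(pts) < 2:
        raise ValueError("need at least two points with k > 0 and kappa > 0")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    if sxx == 0:
        raise ValueError("all degrees are identical; slope is undefined")
    return sxy / sxx - 1.0
```

A slope of one (α = 0) corresponds to the absence of preferential attachment, and a slope of two to linear attachment. An unweighted fit gives every degree value equal weight, so on real data the noisy large-degree tail may need binning or weighting.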





FIG. 1. The $\kappa(k)$ function determined numerically for the citation network (a) and the Internet (b). In (a) the symbols from top to bottom correspond to measurements made at $T_1 = 1991$ and $1995$, respectively. For each curve we used $T_0 = T_1 - 1$. In the inset we show the measured exponents $\alpha$ for each studied year, obtained by fitting the whole $\kappa(k)$ curve. For the Internet (b), $\kappa(k)$ was determined using $T_0 = 1999$ and $T_1 = 2000$, yielding a best-fit exponent $\alpha = 1.05$. In all measurements $\Delta T = 1$ year.




FIG. 2. The $\kappa(k)$ function determined numerically for the NS scientific collaboration network (a) and the actor network (b). In (a) the symbols from top to bottom correspond to measurements made at $T_1 = 1996$ and $1998$, respectively. We have used $T_0 = T_1 - 1$. In (b) the symbols from top to bottom correspond to measurements made at $T_1 = 1950$ and $1960$, respectively. As $T_0$ nodes we used the actors present between 1920 and 1940. In the inset we plot the measured exponents $\alpha$ for each studied year. In all measurements $\Delta T = 1$ year.

In Fig. 1 we present two $\kappa(k)$ curves for the citation network and one for the Internet. For both networks we find that the slope of $\kappa(k)$ on the log-log plot is very close to two, shown as a dashed line in the figure. For the Internet, where the measurement was performed for only one year, we obtain $\alpha = 1.05$, while for the citation network we determined $\kappa(k)$ for eight different years, obtaining the set of $\alpha$ values shown in the inset, indicating $\langle\alpha\rangle = 0.95 \pm 0.1$. Thus we conclude that for these two networks the linear ($\alpha = 1$) preferential attachment hypothesis offers a good approximation.

On the other hand, for the scientific collaboration and actor networks we find $\alpha < 1$ (Fig. 2). The sets of $\alpha$ values for these networks are summarized in the insets of Fig. 2. On average we get $\langle\alpha\rangle = 0.81 \pm 0.1$ for the actor network, and $\langle\alpha\rangle = 0.79 \pm 0.1$ for the scientific collaboration network.

Internal and external links: The observed sublinear behavior predicts that $P(k)$ for the systems shown in Fig. 2 should follow a stretched exponential [?]. Nevertheless, the measured $P(k)$ indicates that a power law offers a better fit. How can we then reconcile the nonlinear form of $\Pi(k)$ with the measured $P(k)$? A potential resolution of this conflict lies in the presence of internal links. For the scientific coauthorship network or the actor web, links appear not only from new nodes added to the network, but also as a result of new links connecting previously existing nodes. The method presented above allows us to investigate separately the attachment mechanisms of these distinct types of links. For this, when determining $\Pi(k)$ we first limit the measurements to external links only, i.e. links that have been added to the system as a result of the appearance of new nodes. Second, we focus only on new internal links, i.e. new links that connect two previously present but disconnected nodes. For example, such an internal link appears when two researchers who have not published jointly before coauthor their first paper together, or when two actors who did not act together before are joined in a new movie. In general, preferential attachment implies that the probability that a new internal link appears between two nodes with degrees $k_1$ and $k_2$ scales with the product $k_1 k_2$ [11]. Focusing on the actor network, we find that both external and internal links follow preferential attachment. However, the exponent $\alpha$ differs for the two types of links.

FIG. 3. Preferential attachment of new nodes (a) and new internal links (b) in the actor network. In (b) we plot the $\kappa(k_1 k_2)$ function. Scaling of $\kappa$ is illustrated for a few selected years. We obtain $0.7 < \alpha < 1$ for the preferential attachment of new nodes (a) and $\alpha$ very close to one for the internal links in the year chosen in panel (b).

From Fig. 3a we conclude that new incoming nodes tend to link to the already existing nodes following the functional form (2) with $\alpha < 1$. The $\kappa(k_1 k_2)$ function determined from the internal links also follows a power law. The scaling is clearly not sublinear, and the exponent determined from the asymptotic behavior is close to two (Fig. 3b). Thus the results indicate that the placement of the internal links is also governed by preferential attachment, which scales linearly with $k$. Similar results (not shown) have been obtained for the scientific collaboration web in neuroscience and mathematics [11]. Note that for the science citation network internal links are not allowed, and the data resolution for the Internet does not allow us to perform the same analysis at this point. We find that while the preferential attachment of the external links is clearly sublinear, the internal links follow a close to linear behavior. As in both the actor network and the scientific collaboration network the number of internal links far outweighs the number of external links, we believe that in the asymptotic limit the internal attachment is the one that drives the shape of the $P(k)$ distribution, eventually being responsible for its power-law form. These measurements raise several interesting possibilities for the analytical treatment of the complex coexistence of internal and external links, which could shed further light on the evolution of complex networks.
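
The internal-link measurement can be sketched along the same lines as the external one. The example below is an illustration under assumptions, not the authors' procedure: the name `measure_internal`, the `(time, u, v)` event format, and the normalization by the number of still-unlinked pairs sharing a given degree product are all choices made here. It enumerates all pairs of existing nodes, so it is only practical for small networks.

```python
from collections import Counter
from itertools import combinations


def measure_internal(events, t1, dt):
    """Cumulative attachment function of new internal links vs. k1 * k2.

    events: iterable of (time, u, v) undirected links.
    Only links created in [t1, t1 + dt) between two nodes that both existed
    before t1 and were not yet linked to each other are counted.
    """
    events = list(events)

    # Degrees and existing links just before the window opens.
    degree = Counter()
    linked = set()
    for t, u, v in events:
        if t < t1:
            degree[u] += 1
            degree[v] += 1
            linked.add(frozenset((u, v)))

    # Number of candidate (unlinked) pairs for each degree product.
    # Quadratic in the number of nodes: fine for a sketch, not for big data.
    candidates = Counter(
        degree[a] * degree[b]
        for a, b in combinations(list(degree), 2)
        if frozenset((a, b)) not in linked
    )

    # New internal links observed in the window, counted once per pair.
    hits = Counter()
    seen = set()
    for t, u, v in events:
        pair = frozenset((u, v))
        if (t1 <= t < t1 + dt and u != v and u in degree and v in degree
                and pair not in linked and pair not in seen):
            seen.add(pair)
            hits[degree[u] * degree[v]] += 1

    raw = {p: hits[p] / n for p, n in candidates.items()}
    total = sum(raw.values())
    if total == 0:
        raise ValueError("no new internal links found in the window")

    kappa, running = {}, 0.0
    for p in sorted(raw):
        running += raw[p] / total
        kappa[p] = running
    return kappa
```

The returned dictionary is keyed by the degree product, so its log-log slope can be read in the same way as for the single-node cumulative function.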

Initial attractiveness: Dorogovtsev, Mendes and Samukhin have recently suggested that, in order to account for the fact that even nodes with no links can acquire links, $\Pi(k)$ should have an additive term, $k_0$, called initial attractiveness [?], so that $\Pi(k) \propto k_0 + k^{\alpha}$. For $\alpha = 1$ it has been demonstrated that the degree exponent, $\gamma$, depends continuously on $k_0$. In principle, having the functional form of $\Pi(k)$ allows us to determine $k_0$ as well. We inspected the form of $\Pi(k)$, which indeed does indicate that a nonzero $k_0$ is present. On the other hand, the available statistics were not sufficient to determine unambiguously the value of $k_0$. In any case, our estimates indicate that $k_0$ is rather small, in the $10^{-6}$ range, thus its presence has no effect on the scaling of $\kappa(k)$ at large $k$. Nevertheless, the nonzero $k_0$ plays an important role in starting the evolution of the node connectivity, since in its absence no disconnected node could acquire initial links.
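
A short calculation makes the size of this effect explicit. It assumes only the additive form of the attachment rate quoted above, with the normalization constant left out; the ratio in the second expression is a quantity introduced here to compare the two terms.

```latex
\begin{align}
\kappa(k) \propto \int_0^k \left(k_0 + k'^{\alpha}\right) dk'
  = k_0\,k + \frac{k^{\alpha+1}}{\alpha+1},
\qquad
\frac{k_0\,k}{k^{\alpha+1}/(\alpha+1)} = \frac{(\alpha+1)\,k_0}{k^{\alpha}}.
\end{align}
```

For an initial attractiveness of order $10^{-6}$ and an exponent near one, the linear term is at most a few parts in $10^{6}$ of the power-law term for any degree of one or more, and it shrinks further as the degree grows.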



Database         # nodes    # links      α
Citation         1736       83252        0.95 ± 0.1
Internet         12409      13445        1.05
Collaboration    209293     3534724      0.79 ± 0.1
Actor            392340     33646882     0.81 ± 0.1

Table 1. Summary of the investigated databases, showing the number of nodes, the number of links, and the average value of the obtained exponent α.

In summary, our measurements offer direct confirmation for the existence of preferential attachment in rather different real evolving networks. The emerging picture is more complex, however, than originally assumed in [?]. We find that for all the networks we studied Eq. (2) gives a good fit for $\Pi(k)$, implying that $\Pi(k)$ follows a power law. The exponent $\alpha$, however, is system dependent: while for the Internet and the citation network a linear $\Pi(k)$ offers a reasonable fit, for the actor network and the collaboration web the attachment rate is sublinear. These results give a firmer foundation to the evolving network models that have been studied extensively to describe the dynamics of real evolving networks. But they also pose an important question: what is the microscopic origin of preferential attachment? What determines the exponent $\alpha$ in general? While some preliminary answers have been proposed [?, 14], a good understanding of this fundamental question is still lacking.

We acknowledge useful discussions with R. Albert, I. Derenyi, and T. Vicsek. This research was supported by NSF, PHY-9988674 and CAREER DMR97-01998.

[19] http://moat.nlanr.net/infrastructure.html